A vine is a graphical tool for labeling constraints in high-dimensional probability distributions. A regular vine is a special case in which all constraints are two-dimensional or conditional two-dimensional. Regular vines generalize trees and are themselves specializations of Cantor trees.
Combined with bivariate copulas, regular vines have proven to be a flexible tool in high-dimensional dependence modeling. Copulas are multivariate distributions with uniform univariate margins. Representing a joint distribution as the product of univariate margins and copulas allows the problem of estimating univariate distributions to be separated from the problem of estimating dependence. This is handy because univariate distributions can often be adequately estimated from data, whereas dependence information is typically known only roughly, resting on summary indicators and judgment. Although the number of parametric multivariate copula families with flexible dependence is limited, there are many parametric families of bivariate copulas. Regular vines owe their increasing popularity to the fact that they leverage bivariate copulas while extending to arbitrary dimensions. Sampling theory and estimation theory for regular vines are well developed, and model inference is well underway. Regular vines have also proven useful in other problems, such as the (constrained) sampling of correlation matrices and the construction of non-parametric continuous Bayesian networks.
For example, in finance, vine copulas have been shown to effectively model tail risk in portfolio optimization applications.
An entirely different motivation underlay the first formal definition of vines in Cooke. Uncertainty analyses of large risk models, such as those undertaken for the European Union and the US Nuclear Regulatory Commission for accidents at nuclear power plants, involve quantifying and propagating uncertainty over hundreds of variables. Dependence information for such studies had been captured with Markov trees, which are trees constructed with nodes as univariate random variables and edges as bivariate copulas. For n variables, there are at most n − 1 edges for which dependence can be specified. New techniques at that time involved obtaining uncertainty distributions on modeling parameters by eliciting experts' uncertainties on other variables which are predicted by the models. These uncertainty distributions are pulled back onto the model's parameters by a process known as probabilistic inversion. The resulting distributions often displayed a dependence structure that could not be captured as a Markov tree.
Graphical models called vines were introduced in 1997 and further refined by Roger M. Cooke, Tim Bedford, and Dorota Kurowicka. An important feature of vines is that they can add conditional dependencies among variables on top of a Markov tree, which is generally too parsimonious to summarize the dependence among variables.
Recall that an edge in a tree is an unordered set of two nodes. Each edge in a vine is associated with a constraint set: the set of variables (nodes in the first tree) reachable from the edge by the set membership relation. For each edge, the constraint set is the union of the constraint sets of the edge's two members, called its component constraint sets (for an edge in the first tree, the component constraint sets are empty). The constraint associated with each edge is the symmetric difference of its component constraint sets, conditional on their intersection. One can show that for a regular vine, the symmetric difference of the component constraint sets is always a doubleton and that each pair of variables occurs exactly once as constrained variables. In other words, all constraints are bivariate or conditional bivariate.
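The constraint-set bookkeeping above is easy to mechanize. Below is a minimal Python sketch (the edge encoding and function names are illustrative, not from the literature): variables are integers, an edge in tree k is a pair of edges from tree k−1, and the constrained pair and conditioning set fall out of unions, symmetric differences, and intersections.

```python
def complete_union(edge):
    """Recursively collect the variables (tree-1 nodes) reachable from an edge."""
    if isinstance(edge, int):            # a single variable (node of the first tree)
        return {edge}
    a, b = edge
    return complete_union(a) | complete_union(b)

def constraint(edge):
    """Return (conditioned pair, conditioning set) for a vine edge."""
    a, b = edge
    ua, ub = complete_union(a), complete_union(b)
    return ua ^ ub, ua & ub              # symmetric difference | intersection

# D-vine on variables 1..4:
e12, e23, e34 = (1, 2), (2, 3), (3, 4)   # tree 1: a path
e13_2, e24_3 = (e12, e23), (e23, e34)    # tree 2 joins adjacent tree-1 edges
e14_23 = (e13_2, e24_3)                  # tree 3

for e in [e13_2, e24_3, e14_23]:
    pair, cond = constraint(e)
    print(sorted(pair), "|", sorted(cond))
```

As the text asserts, each conditioned set is a doubleton, and every pair of variables appears exactly once: {1,3}|{2}, {2,4}|{3}, and {1,4}|{2,3}.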
The degree of a node is the number of edges attached to it. The simplest regular vines have the simplest degree structure: the D-vine assigns every node degree 1 or 2, while the C-vine assigns one node in each tree the maximal degree. For large vines, it is clearer to draw each tree separately.
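The two degree structures can be checked directly on the first trees: a D-vine's first tree is a path, a C-vine's is a star. A small illustrative sketch (names are ours):

```python
from collections import Counter

def degrees(edges):
    """Degree of each node, given edges as pairs of nodes."""
    return Counter(v for e in edges for v in e)

n = 5
d_vine_t1 = [(i, i + 1) for i in range(1, n)]    # path: every degree is 1 or 2
c_vine_t1 = [(1, j) for j in range(2, n + 1)]    # star: node 1 has maximal degree n-1

print(degrees(d_vine_t1))
print(degrees(c_vine_t1))
```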
The number of regular vines on n variables grows rapidly in n: there are 2^(n−3) ways of extending a regular vine with one additional variable, and there are n(n−1)(n−2)! · 2^((n−2)(n−3)/2) / 2 = (n!/2) · 2^((n−2)(n−3)/2) labeled regular vines on n variables.
The constraints on a regular vine may be associated with partial correlations or with conditional bivariate copulas. In the former case one speaks of a partial correlation vine, and in the latter of a vine copula.
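For three variables, the link between the partial correlation parametrization and an ordinary correlation matrix is the standard recursion relating a partial correlation to product-moment correlations. A minimal sketch (the function name is ours), assuming a D-vine 1–2–3 with correlations r12, r23 on its first tree and partial correlation r13;2 on its second:

```python
from math import sqrt

def corr_from_partial(r12, r23, r13_2):
    """Recover r13 from the partial correlation r13;2 via
    r13;2 = (r13 - r12*r23) / sqrt((1 - r12^2)(1 - r23^2))."""
    return r13_2 * sqrt((1 - r12**2) * (1 - r23**2)) + r12 * r23

print(corr_from_partial(0.5, 0.4, 0.3))
```

One attraction of the partial correlation vine is that any assignment of values in (−1, 1) to its edges yields a valid (positive definite) correlation matrix.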
A vine copula density on variables x_1, ..., x_n can be written as

f(x_1, ..., x_n) = ∏_{i=1..n} f_i(x_i) · ∏_{e ∈ E} c_{j(e),k(e)|D(e)}( F(x_{j(e)} | x_{D(e)}), F(x_{k(e)} | x_{D(e)}) ),

where the edges e, with conditioned variables j(e), k(e) and conditioning set D(e), run over the edge set E of any regular vine. The conditional copula densities in this representation depend on the cumulative conditional distribution functions of the conditioned variables, F(x_{j(e)} | x_{D(e)}) and F(x_{k(e)} | x_{D(e)}), and, potentially, on the values of the conditioning variables. When the conditional copulas do not depend on the values of the conditioning variables, one speaks of the simplifying assumption of constant conditional copulas. Though most applications invoke this assumption, exploration of the modelling freedom gained by discharging this assumption has begun. When bivariate Gaussian copulas are assigned to the edges of a vine, the resulting multivariate density is the Gaussian density parametrized by a partial correlation vine rather than by a correlation matrix.
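The pair-copula factorization under the simplifying assumption can be made concrete for three variables with Gaussian pair copulas. The following is a minimal Python sketch using only the standard library; the h-function form of the conditional CDF is the standard one for Gaussian pair copulas, and the function names are ours:

```python
from math import exp, sqrt
from statistics import NormalDist

N = NormalDist()

def gauss_copula_pdf(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    x, y = N.inv_cdf(u), N.inv_cdf(v)
    s = 1 - rho * rho
    return exp(-(rho * rho * (x * x + y * y) - 2 * rho * x * y) / (2 * s)) / sqrt(s)

def h(u, v, rho):
    """Conditional CDF F(u | v) under a Gaussian pair copula (the 'h-function')."""
    return N.cdf((N.inv_cdf(u) - rho * N.inv_cdf(v)) / sqrt(1 - rho * rho))

def dvine3_pdf(u1, u2, u3, r12, r23, r13_2):
    """Three-variable D-vine copula density under the simplifying assumption:
    c12(u1,u2) * c23(u2,u3) * c13|2(F(u1|u2), F(u3|u2))."""
    return (gauss_copula_pdf(u1, u2, r12)
            * gauss_copula_pdf(u2, u3, r23)
            * gauss_copula_pdf(h(u1, u2, r12), h(u3, u2, r23), r13_2))
```

With all three parameters set to zero, every pair copula is the independence copula and the density is identically 1, as it should be.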
The vine pair-copula construction, based on the sequential mixing of conditional distributions, has been adapted to discrete variables and to mixed discrete/continuous responses. Factor copulas, in which latent variables are added to the vine, have also been proposed.
Vine researchers have developed algorithms for maximum likelihood estimation and simulation of vine copulas, for finding truncated vines that summarize the dependence in data, for enumerating regular vines, and so on. Chapter 6 of Dependence Modeling with Copulas summarizes these algorithms in pseudocode.
Truncated vine copulas (introduced by E. C. Brechmann in his Ph.D. thesis) are vine copulas with independence copulas in their last trees. In this way truncated vine copulas encode conditional independences in their structure. Truncated vines are very useful because they contain far fewer parameters than full regular vines. An important question is which tree should stand at the highest level. An interesting relationship between truncated vines and cherry tree copulas has been presented: cherry tree graph representations were introduced as an alternative to the usual graphical representations of vine copulas, and they highlight the conditional independences encoded by the last tree (the first tree after truncation). The cherry tree sequence representation of vine copulas gives a new way to look at truncated vine copulas, based on the conditional independence caused by truncation.
An implied sampling order is generated by a nested sequence of sub-vines in which each sub-vine contains one new variable not present in its predecessor. For any regular vine on n variables there are multiple implied sampling orders; they form a small subset of all orders, but they greatly facilitate sampling. Conditionalizing a regular vine on the values of an arbitrary subset of variables is a complex operation. Conditionalizing on an initial sequence of an implied sampling order, however, is trivial: one simply plugs in the initial conditional values and proceeds with the sampling. A general theory of conditionalization does not exist at present.
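Sampling along an implied order reduces to inverting conditional distributions one variable at a time, and conditionalizing on an initial sequence amounts to plugging in the known values. A hedged sketch for a three-variable D-vine with Gaussian pair copulas (function names are ours; the h-function and its inverse are the standard Gaussian-copula forms):

```python
import random
from math import sqrt
from statistics import NormalDist

N = NormalDist()

def h(u, v, rho):
    """Conditional CDF F(u | v) under a bivariate Gaussian copula."""
    return N.cdf((N.inv_cdf(u) - rho * N.inv_cdf(v)) / sqrt(1 - rho * rho))

def h_inv(w, v, rho):
    """Inverse of h in its first argument: the u with F(u | v) = w."""
    return N.cdf(N.inv_cdf(w) * sqrt(1 - rho * rho) + rho * N.inv_cdf(v))

def sample_dvine3(r12, r23, r13_2, u1=None):
    """Sample (u1, u2, u3) along the implied order 1, 2, 3.
    Passing u1 conditionalizes on an initial sequence: just plug it in."""
    if u1 is None:
        u1 = random.random()
    u2 = h_inv(random.random(), u1, r12)
    # invert F(u3 | u1, u2) = h( h(u3|u2) | h(u1|u2); r13;2 ) in two steps
    t = h_inv(random.random(), h(u1, u2, r12), r13_2)
    u3 = h_inv(t, u2, r23)
    return u1, u2, u3
```

The same two-function toolkit (h and its inverse) drives both evaluation of conditional CDFs and sampling, which is why implied sampling orders are so convenient in practice.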